Supplementary Notebook 3B2


This dataset is shown in Figure 3B of the manuscript.

Datasets can be downloaded here

Data preparation is illustrated here

Ref: Nestorowa, S. et al. A single-cell resolution map of mouse hematopoietic stem and progenitor cell differentiation. Blood 128, e20-31 (2016).

Read in data

To load 10x Genomics single-cell RNA-seq data processed with Cell Ranger:
(The variable index can be reset by choosing a different column of genes.tsv)

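import stream as st

# Read the Cell Ranger matrix together with its gene and barcode tables
adata = st.read(file_name='./filtered_gene_bc_matrices/matrix.mtx',
                file_feature='./filtered_gene_bc_matrices/genes.tsv',
                file_sample='./filtered_gene_bc_matrices/barcodes.tsv',
                file_format='mtx', workdir='./stream_result')
# Use column 1 of the gene table as the variable (gene) index
adata.var.index = adata.var[1].values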
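
To make the column choice concrete, here is a minimal stand-alone sketch that uses only the standard library and is not STREAM code. It assumes the usual header-less, tab-separated Cell Ranger layout with gene IDs in column 0 and gene symbols in column 1. The three rows and the helper name read_gene_column are illustrative only.

```python
import csv
import io

# Illustrative excerpt of a genes.tsv file: tab-separated, no header row.
# Assumed layout: column 0 = gene ID, column 1 = gene symbol.
GENES_TSV = (
    "ENSMUSG00000051951\tXkr4\n"
    "ENSMUSG00000025902\tSox17\n"
    "ENSMUSG00000033845\tMrpl15\n"
)

def read_gene_column(text, column):
    """Return the values of one column of a header-less TSV table."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [row[column] for row in reader]

gene_ids = read_gene_column(GENES_TSV, 0)      # what column 0 would give
gene_symbols = read_gene_column(GENES_TSV, 1)  # what column 1 would give
print(gene_ids)
print(gene_symbols)
```

Indexing by column 1 gives readable gene symbols, while column 0 gives identifiers that are guaranteed to be unique.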

If the AnnData object has already been created, simply specify the working directory to run STREAM:

st.set_workdir(adata,'./stream_result')

Read in metadata

Alternatively, this step can be split into two steps:
(if file_name is not specified, 'unknown' is generated as the cell label and a random color is used for each label)

st.add_cell_labels(adata,file_name='./cell_label.tsv.gz')
st.add_cell_colors(adata,file_name='./cell_label_color.tsv.gz')
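
The sketch below shows one way such label files could be prepared with the standard library. The file layout is an assumption, not something stated in this notebook: cell_label.tsv.gz is taken to hold one label per line in cell order, and cell_label_color.tsv.gz to hold tab-separated label and color pairs. The labels, colors, and the helper name write_label_files are placeholders.

```python
import gzip
import os
import tempfile

# Placeholder labels, one per cell, in the same order as the cells in the matrix.
cell_labels = ["typeA", "typeA", "typeB", "typeC", "typeB"]
# Placeholder colors, one per distinct label.
label_colors = {"typeA": "#1f77b4", "typeB": "#ff7f0e", "typeC": "#2ca02c"}

def write_label_files(out_dir, labels, colors):
    """Write gzip-compressed label and label-color TSV files; return their paths."""
    label_path = os.path.join(out_dir, "cell_label.tsv.gz")
    color_path = os.path.join(out_dir, "cell_label_color.tsv.gz")
    with gzip.open(label_path, "wt") as fh:
        for label in labels:
            fh.write(label + "\n")
    with gzip.open(color_path, "wt") as fh:
        for label, color in colors.items():
            fh.write(f"{label}\t{color}\n")
    return label_path, color_path

out_dir = tempfile.mkdtemp()
label_path, color_path = write_label_files(out_dir, cell_labels, label_colors)

# Read the label file back to confirm it round-trips.
with gzip.open(label_path, "rt") as fh:
    print(fh.read().splitlines())
```

Every label that appears in the label file should also have an entry in the color file, otherwise the two files cannot be matched up.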

Calculate QC

Feature selection

Please check if the blue curve fits the points well. If not, please adjust the parameter 'loess_frac' (usually by lowering it) until the blue curve fits well.
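
To see why lowering a smoothing fraction makes the curve follow the points more closely, here is a generic LOESS-style sketch in plain Python. It is not STREAM's fitting code, and the function name local_linear_fit and the toy data are made up for illustration. Each fitted value is a tricube-weighted local linear regression over the nearest fraction of the points.

```python
import math

def local_linear_fit(xs, ys, frac):
    """Smooth ys over xs with tricube-weighted local linear regression.

    Each fitted value uses only the ceil(frac * n) points nearest to it,
    so a smaller frac gives a more flexible curve.
    """
    n = len(xs)
    k = max(3, min(n, math.ceil(frac * n)))
    fitted = []
    for x0 in xs:
        nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - x0))[:k]
        h = abs(nearest[-1][0] - x0) or 1.0  # window half-width
        sw = swx = swy = swxx = swxy = 0.0
        for x, y in nearest:
            w = (1.0 - (abs(x - x0) / h) ** 3) ** 3  # tricube weight
            sw += w
            swx += w * x
            swy += w * y
            swxx += w * x * x
            swxy += w * x * y
        denom = sw * swxx - swx * swx
        if abs(denom) < 1e-12:
            fitted.append(swy / sw)  # degenerate window: fall back to weighted mean
        else:
            slope = (sw * swxy - swx * swy) / denom
            intercept = (swy - slope * swx) / sw
            fitted.append(intercept + slope * x0)
    return fitted

# Toy data with curvature: a wide window cannot follow it, a narrow one can.
xs = [float(i) for i in range(11)]
ys = [x * x for x in xs]
for frac in (1.0, 0.3):
    fit = local_linear_fit(xs, ys, frac)
    err = sum(abs(f - y) for f, y in zip(fit, ys))
    print(f"frac={frac}: total absolute error = {err:.2f}")
```

The smaller fraction gives the lower error on this curved toy data. The trade-off is that a very small fraction starts to follow noise as well as trend.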

Dimensional reduction

Alternatively, using variable genes as features: st.dimension_reduction(adata,method='se',feature='var_genes',n_neighbors=15, n_components=4)

Visualize cells on a 2D plane when n_components>=3 in st.dimension_reduction()

Trajectory inference

epg_alpha, epg_mu, and epg_lambda are the three most influential parameters for learning the elastic principal graph.

'epg_trimmingradius' can help get rid of noisy points (by default epg_trimmingradius=Inf)
e.g. st.elastic_principal_graph(adata,epg_trimmingradius=0.1)
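
As rough intuition for what a trimming radius does, the sketch below splits points into those within a given radius of their nearest graph node and those beyond it. It is plain Python and not the elastic principal graph implementation. The working assumption is that only points inside the radius pull on node positions. The coordinates, nodes, and the helper name split_by_trimming_radius are made up.

```python
import math

def split_by_trimming_radius(points, nodes, radius):
    """Return (kept, trimmed): points within / beyond `radius` of their nearest node."""
    kept, trimmed = [], []
    for p in points:
        nearest = min(math.dist(p, q) for q in nodes)
        (kept if nearest <= radius else trimmed).append(p)
    return kept, trimmed

# Two hypothetical graph nodes and four cells in a 2D embedding.
nodes = [(0.0, 0.0), (1.0, 0.0)]
points = [(0.05, 0.02), (0.95, 0.05), (0.5, 0.6), (1.0, 0.08)]

kept, trimmed = split_by_trimming_radius(points, nodes, radius=0.1)
print("kept:", kept)
print("trimmed:", trimmed)

# With an infinite radius no point is trimmed.
kept_all, trimmed_all = split_by_trimming_radius(points, nodes, radius=math.inf)
print("trimmed with infinite radius:", trimmed_all)
```

A useful radius depends on the scale of the embedding, so it is worth checking how many cells fall outside it before settling on a value.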

Adjusting trajectories (optional)

Trajectory visualization

flat tree
stream plot at single cell level
stream plots

Some useful parameters to fine-tune the appearance of stream plots:

Marker genes detection

marker_list defines the list of genes to scan. If it is not specified, all available genes are scanned by default, which can be time-consuming.

Here we only include variable genes.

1) detect marker genes for each leaf branch
2) detect transition genes for each branch
3) detect marker genes that are differentially expressed between pairs of branches
4) detect cell population-specific markers
st.detect_markers(adata,ident='label',marker_list=adata.uns['var_genes'],cutoff_zscore=1.0,cutoff_pvalue=0.01)

Save results

To read back the saved .pkl file

adata = st.read('./stream_result/stream_result.pkl')
Additionally, the STREAM analysis results can also be saved

Depending on the annotations and genes the user wants to visualize, the following command creates a singlecellVR-compatible JSON object.

scvr -f stream_result.pkl -t STREAM -a ANNOTATIONS [-g GENES] [-o OUTPUT]